Results 1 - 20 of 21
1.
Sci Data ; 10(1): 126, 2023 03 09.
Article in English | MEDLINE | ID: covidwho-2263639

ABSTRACT

Understanding the scope, prevalence, and impact of the COVID-19 pandemic response will be a rich ground for research for many years. Key to the response to COVID-19 was the non-pharmaceutical intervention (NPI) measures, such as mask mandates or stay-in-place orders. For future pandemic preparedness, it is critical to understand the impact and scope of these interventions. Given the ongoing nature of the pandemic, existing NPI studies covering only the initial portion provide only a narrow view of the impact of NPI measures. This paper describes a dataset of NPI measures taken by counties in the U.S. state of Virginia that includes measures taken over the first two years of the pandemic beginning in March 2020. This data enables analyses of NPI measures over a long time period that can produce impact analyses on both the individual NPI effectiveness in slowing the pandemic spread, and the impact of various NPI measures on the behavior and conditions of the different counties and state.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/therapy , Data Curation , Pandemics , Policy , Virginia
2.
IEEE Trans Med Imaging ; 41(12): 3509-3519, 2022 Dec.
Article in English | MEDLINE | ID: covidwho-1909269

ABSTRACT

The recent success of learning-based algorithms can be greatly attributed to the immense amount of annotated data used for training. Yet, many datasets lack annotations due to the high costs associated with labeling, resulting in degraded performances of deep learning methods. Self-supervised learning is frequently adopted to mitigate the reliance on massive labeled datasets since it exploits unlabeled data to learn relevant feature representations. In this work, we propose SS-StyleGAN, a self-supervised approach for image annotation and classification suitable for extremely small annotated datasets. This novel framework adds self-supervision to the StyleGAN architecture by integrating an encoder that learns the embedding to the StyleGAN latent space, which is well-known for its disentangled properties. The learned latent space enables the smart selection of representatives from the data to be labeled for improved classification performance. We show that the proposed method attains strong classification results using small labeled datasets of sizes 50 and even 10. We demonstrate the superiority of our approach for the tasks of COVID-19 and liver tumor pathology identification.


Subject(s)
COVID-19 , Data Curation , Humans , Algorithms , Supervised Machine Learning
3.
Public Health Rep ; 137(2): 197-202, 2022.
Article in English | MEDLINE | ID: covidwho-1582752

ABSTRACT

The public health crisis created by the COVID-19 pandemic has spurred a deluge of scientific research aimed at informing the public health and medical response to the pandemic. However, early in the pandemic, those working in frontline public health and clinical care had insufficient time to parse the rapidly evolving evidence and use it for decision-making. Academics in public health and medicine were well-placed to translate the evidence for use by frontline clinicians and public health practitioners. The Novel Coronavirus Research Compendium (NCRC), a group of >60 faculty and trainees across the United States, formed in March 2020 with the goal to quickly triage and review the large volume of preprints and peer-reviewed publications on SARS-CoV-2 and COVID-19 and summarize the most important, novel evidence to inform pandemic response. From April 6 through December 31, 2020, NCRC teams screened 54 192 peer-reviewed articles and preprints, of which 527 were selected for review and uploaded to the NCRC website for public consumption. Most articles were peer-reviewed publications (n = 395, 75.0%), published in 102 journals; 25.1% (n = 132) of articles reviewed were preprints. The NCRC is a successful model of how academics translate scientific knowledge for practitioners and help build capacity for this work among students. This approach could be used for health problems beyond COVID-19, but the effort is resource intensive and may not be sustainable in the long term.


Subject(s)
COVID-19 , Data Curation/methods , Information Dissemination/methods , Interdisciplinary Research/organization & administration , Peer Review, Research , Preprints as Topic , SARS-CoV-2 , Humans , Public Health , United States
4.
Sci Data ; 8(1): 297, 2021 11 22.
Article in English | MEDLINE | ID: covidwho-1528020

ABSTRACT

The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. As of May 23rd, 2021, over 5 million participants have collectively logged over 360 million self-assessment reports since its introduction in March 2020. The success of the Covid Symptom Study creates significant technical challenges around effective data curation. The primary issue is scale. The size of the dataset means that it can no longer be readily processed using standard Python-based data analytics software such as Pandas on commodity hardware. Alternative technologies exist but carry a higher technical complexity and are less accessible to many researchers. We present ExeTera, a Python-based open source software package designed to provide Pandas-like data analytics on datasets that approach terabyte scales. We present its design and capabilities, and show how it is a critical component of a data curation pipeline that enables reproducible research across an international research group for the Covid Symptom Study.


Subject(s)
COVID-19/epidemiology , Citizen Science , Data Curation , Big Data , Data Science , Datasets as Topic , Epidemiological Monitoring , Humans , Mobile Applications , Smartphone , Software
5.
Nucleic Acids Res ; 50(D1): D687-D692, 2022 01 07.
Article in English | MEDLINE | ID: covidwho-1522256

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.


Subject(s)
Antiviral Agents/pharmacology , Knowledge Bases , Proteins/metabolism , COVID-19/metabolism , Data Curation , Genome, Human , Host-Pathogen Interactions , Humans , Proteins/genetics , Signal Transduction , Software
6.
Nucleic Acids Res ; 50(D1): D1282-D1294, 2022 01 07.
Article in English | MEDLINE | ID: covidwho-1493886

ABSTRACT

The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb; www.guidetopharmacology.org) is an open-access, expert-curated database of molecular interactions between ligands and their targets. We describe expansion in content over nine database releases made during the last two years, which has focussed on three main areas of infection. The COVID-19 pandemic continues to have a major impact on health worldwide. GtoPdb has sought to support the wider research community to understand the pharmacology of emerging drug targets for SARS-CoV-2 as well as potential targets in the host to block viral entry and reduce the adverse effects of infection in patients with COVID-19. We describe how the database rapidly evolved to include a new family of Coronavirus proteins. Malaria remains a global threat to half the population of the world. Our database content continues to be enhanced through our collaboration with Medicines for Malaria Venture (MMV) on the IUPHAR/MMV Guide to MALARIA PHARMACOLOGY (www.guidetomalariapharmacology.org). Antibiotic resistance is also a growing threat to global health. In response, we have extended our coverage of antibacterials in partnership with AntibioticDB.


Subject(s)
Anti-Bacterial Agents/pharmacology , Antimalarials/pharmacology , Antiviral Agents/pharmacology , COVID-19 Drug Treatment , Anti-Bacterial Agents/chemistry , COVID-19/etiology , Data Curation , Databases, Pharmaceutical , Humans , Ligands , Malaria/drug therapy , Malaria/metabolism , User-Computer Interface , Viral Proteins/chemistry , Viral Proteins/metabolism
7.
Nucleic Acids Res ; 49(W1): W352-W358, 2021 07 02.
Article in English | MEDLINE | ID: covidwho-1317922

ABSTRACT

Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.


Subject(s)
Publications , Software , COVID-19 , Data Curation , Healthcare Disparities , Humans , Internet , Liver Neoplasms/epidemiology , Machine Learning
8.
Nat Commun ; 12(1): 2017, 2021 04 01.
Article in English | MEDLINE | ID: covidwho-1164850

ABSTRACT

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.


Subject(s)
COVID-19 , Data Curation/methods , Expert Systems , Machine Learning , Datasets as Topic , Electronic Health Records , Humans , Natural Language Processing , SARS-CoV-2
9.
Nucleic Acids Res ; 49(D1): D589-D599, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1117395

ABSTRACT

PAGER-CoV (http://discovery.informatics.uab.edu/PAGER-CoV/) is a new web-based database that can help biomedical researchers interpret coronavirus-related functional genomic study results in the context of curated knowledge of host viral infection, inflammatory response, organ damage, and tissue repair. The new database consists of 11 835 PAGs (Pathways, Annotated gene-lists, or Gene signatures) from 33 public data sources. Through the web user interface, users can search by a query gene or a query term and retrieve significantly matched PAGs with all the curated information. Users can navigate from a PAG of interest to other related PAGs through either shared PAG-to-PAG co-membership relationships or PAG-to-PAG regulatory relationships, totaling 19 996 993. Users can also retrieve enriched PAGs from an input list of COVID-19 functional study result genes, customize the search data sources, and export all results for subsequent offline data analysis. In a case study, we performed a gene set enrichment analysis (GSEA) of a COVID-19 RNA-seq data set from the Gene Expression Omnibus database. Compared with the results using the standard PAGER database, PAGER-CoV allows for more sensitive matching of known immune-related gene signatures. We expect PAGER-CoV to be invaluable for biomedical researchers to find molecular biology mechanisms and tailored therapeutics to treat COVID-19 patients.


Subject(s)
Algorithms , COVID-19/prevention & control , Computational Biology/methods , Coronavirus/genetics , Databases, Genetic , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/virology , Coronavirus/metabolism , Data Curation/methods , Epidemics , Gene Regulatory Networks , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , User-Computer Interface
10.
Nucleic Acids Res ; 49(D1): D1152-D1159, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1117392

ABSTRACT

The current state of the COVID-19 pandemic is a global health crisis. To fight the novel coronavirus, one of the best-known ways is to block enzymes essential for virus replication. Currently, we know that the SARS-CoV-2 virus encodes about 29 proteins such as spike protein, 3C-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), Papain-like protease (PLpro), and nucleocapsid (N) protein. SARS-CoV-2 uses human angiotensin-converting enzyme 2 (ACE2) for viral entry and transmembrane serine protease family member II (TMPRSS2) for spike protein priming. Thus in order to speed up the discovery of potential drugs, we develop DockCoV2, a drug database for SARS-CoV-2. DockCoV2 focuses on predicting the binding affinity of FDA-approved and Taiwan National Health Insurance (NHI) drugs with the seven proteins mentioned above. This database contains a total of 3,109 drugs. DockCoV2 is easy to use and search against, is well cross-linked to external databases, and provides the state-of-the-art prediction results in one site. Users can download their drug-protein docking data of interest and examine additional drug-related information on DockCoV2. Furthermore, DockCoV2 provides experimental information to help users understand which drugs have already been reported to be effective against MERS or SARS-CoV. DockCoV2 is available at https://covirus.cc/drugs/.


Subject(s)
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Databases, Pharmaceutical/statistics & numerical data , SARS-CoV-2/drug effects , Antiviral Agents/metabolism , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Humans , Internet , Models, Molecular , Pandemics , Protein Binding/drug effects , Protein Domains , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/chemistry , Viral Proteins/metabolism , Virus Replication/drug effects
11.
Nucleic Acids Res ; 49(D1): D1534-D1540, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1117391

ABSTRACT

Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10,000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.


Subject(s)
COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Factual , PubMed/statistics & numerical data , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Humans , Internet , Machine Learning , Pandemics , Publications/statistics & numerical data , SARS-CoV-2/physiology
12.
Nucleic Acids Res ; 49(D1): D613-D621, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1048364

ABSTRACT

WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approaches. With the core idea of the scientific community developing and curating biological knowledge in pathway models, WikiPathways lowers all barriers for accessing and using its content. Increasingly more content creators, initiatives, projects and tools have started using WikiPathways. Central in this growth and increased use of WikiPathways are the various communities that focus on particular subsets of molecular pathways such as for rare diseases and lipid metabolism. Knowledge from published pathway figures helps prioritize pathway development, using optical character and named entity recognition. We show the growth of WikiPathways over the last three years, highlight the new communities and collaborations of pathway authors and curators, and describe various technologies to connect to external resources and initiatives. The road toward a sustainable, community-driven pathway database goes through integration with other resources such as Wikidata and allowing more use, curation and redistribution of WikiPathways content.


Subject(s)
Databases, Factual , COVID-19/pathology , Data Curation , Humans , Publications , User-Computer Interface
13.
Nucleic Acids Res ; 49(D1): D981-D987, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1010406

ABSTRACT

The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism knowledgebase for the laboratory mouse, a widely used animal model for comparative studies of the genetic and genomic basis for human health and disease. MGD is the authoritative source for biological reference data related to mouse genes, gene functions, phenotypes and mouse models of human disease. MGD is the primary source for official gene, allele, and mouse strain nomenclature based on the guidelines set by the International Committee on Standardized Nomenclature for Mice. MGD's biocuration scientists curate information from the biomedical literature and from large and small datasets contributed directly by investigators. In this report we describe significant enhancements to the content and interfaces at MGD, including (i) improvements in the Multi Genome Viewer for exploring the genomes of multiple mouse strains, (ii) inclusion of many more mouse strains and new mouse strain pages with extended query options and (iii) integration of extensive data about mouse strain variants. We also describe improvements to the efficiency of literature curation processes and the implementation of an information portal focused on mouse models and genes for the study of COVID-19.


Subject(s)
COVID-19/prevention & control , Databases, Genetic , Genome/genetics , Genomics/methods , Knowledge Bases , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Disease Models, Animal , Epidemics , Gene Ontology , Humans , Information Storage and Retrieval/methods , Internet , Mice , SARS-CoV-2/physiology
14.
Nucleic Acids Res ; 49(D1): D480-D489, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-944363

ABSTRACT

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.


Subject(s)
Computational Biology/methods , Data Curation/methods , Databases, Protein , Knowledge Bases , Proteome/metabolism , Proteomics/methods , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Humans , Internet , Molecular Sequence Annotation/methods , Pandemics , Proteome/genetics , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , User-Computer Interface , Viral Proteins/genetics , Viral Proteins/metabolism
15.
Nucleic Acids Res ; 49(D1): D1373-D1380, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-944361

ABSTRACT

The development of new drugs for diseases is a time-consuming, costly and risky process. In recent years, many drugs could be approved for other indications. This repurposing process effectively reduces development costs and time and, ultimately, saves patients' lives. During the ongoing COVID-19 pandemic, drug repositioning has gained widespread attention as a fast opportunity to find potential treatments against the newly emerging disease. In order to expand this field to researchers with varying levels of experience, we made an effort to open it to all users (meaning novices as well as experts in cheminformatics) by significantly improving the entry-level user experience. The browsing functionality can be used as a global entry point to collect further information with regard to small molecules (∼1 million), side-effects (∼110 000) or drug-target interactions (∼3 million). The drug-repositioning tab for small molecules will also suggest possible drug-repositioning opportunities to the user by using structural similarity measurements for small molecules using two different approaches. Additionally, using information from the Promiscuous 2.0 Database, lists of candidate drugs for given indications were precomputed, including a section dedicated to potential treatments for COVID-19. All the information is interconnected by a dynamic network-based visualization to identify new indications for available compounds. Promiscuous 2.0 is unique in its functionality and is publicly available at http://bioinformatics.charite.de/promiscuous2.


Subject(s)
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Computational Biology/methods , Databases, Pharmaceutical , Drug Repositioning/statistics & numerical data , SARS-CoV-2/drug effects , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Drug Repositioning/methods , Humans , Information Storage and Retrieval/methods , Internet , Pandemics , SARS-CoV-2/physiology
16.
Nucleic Acids Res ; 49(D1): D1046-D1057, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-939577

ABSTRACT

For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. Striving to keep data up-to-date, new updates to gene annotations include GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the outbreak of coronavirus, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genome/genetics , Genomics/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Epidemics , Humans , Internet , Mice , Molecular Sequence Annotation/methods , SARS-CoV-2/physiology , Software
17.
Sci Data ; 7(1): 405, 2020 11 16.
Article in English | MEDLINE | ID: covidwho-926927

ABSTRACT

Management of the COVID-19 pandemic has proven to be a significant challenge to policy makers. This is in large part due to uneven reporting and the absence of open-access visualization tools to present local trends and infer healthcare needs. Here we report the development of CovidCounties.org, an interactive web application that depicts daily disease trends at the level of US counties using time series plots and maps. This application is accompanied by a manually curated dataset that catalogs all major public policy actions made at the state-level, as well as technical validation of the primary data. Finally, the underlying code for the site is also provided as open source, enabling others to validate and learn from this work.


Subject(s)
Coronavirus Infections/epidemiology , Pneumonia, Viral/epidemiology , Software , Betacoronavirus , COVID-19 , Data Curation/methods , Datasets as Topic , Humans , Internet , Pandemics , SARS-CoV-2 , United States/epidemiology
18.
Nucleic Acids Res ; 49(D1): D1507-D1514, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-920714

ABSTRACT

Europe PMC (https://europepmc.org) is a database of research articles, including peer reviewed full text articles and abstracts, and preprints - all freely available for use via website, APIs and bulk download. This article outlines new developments since 2017 where work has focussed on three key areas: (i) Europe PMC has added to its core content to include life science preprint abstracts and a special collection of full text of COVID-19-related preprints. Europe PMC is unique as an aggregator of biomedical preprints alongside peer-reviewed articles, with over 180 000 preprints available to search. (ii) Europe PMC has significantly expanded its links to content related to the publications, such as links to Unpaywall, providing wider access to full text, preprint peer-review platforms, all major curated data resources in the life sciences, and experimental protocols. The redesigned Europe PMC website features the PubMed abstract and corresponding PMC full text merged into one article page; there is more evident and user-friendly navigation within articles and to related content, plus a figure browse feature. (iii) The expanded annotations platform offers ∼1.3 billion text mined biological terms and concepts sourced from 10 providers and over 40 global data resources.


Subject(s)
Biological Science Disciplines/statistics & numerical data , COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , PubMed , SARS-CoV-2/isolation & purification , Biological Science Disciplines/methods , Biomedical Research/methods , Biomedical Research/statistics & numerical data , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Epidemics , Europe , Humans , Internet , SARS-CoV-2/physiology
19.
Nucleic Acids Res ; 49(D1): D817-D824, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-851820

ABSTRACT

ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Data Curation/methods , Databases, Genetic , Genome, Viral/genetics , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/virology , Genetic Variation , Humans , Information Storage and Retrieval/methods , Internet , Pandemics , SARS-CoV-2/physiology , User-Computer Interface
20.
J Med Libr Assoc ; 108(4): 656-662, 2020 Oct 01.
Article in English | MEDLINE | ID: covidwho-814784

ABSTRACT

Since January 30, 2020, when the World Health Organization declared the SARS CoV-2 disease (COVID-19) to be a public health emergency of international concern, the National Library of Medicine's (NLM's) Web Collecting and Archiving Working Group has been collecting a broad range of web-based content about the emerging pandemic for preservation in an Internet archive. Like NLM's other Global Health Events web collections, this content will have enduring value as a multifaceted historical record for future study and understanding of this event. This article describes the scope of the COVID-19 project; some of the content captured from websites, blogs, and social media; collecting criteria and methods; and related COVID-19 collecting efforts by other groups. The growing collection (2,500 items as of June 30, 2020) chronicles the many facets of the pandemic: epidemiology; vaccine and drug research; disease control measures and resistance to them; effects of the pandemic on health care institutions and workers, education, commerce, and many aspects of social life; effects for especially vulnerable groups; role of health disparities in infection and mortality; and recognition of racism as a public health emergency.


Subject(s)
Archives , Coronavirus Infections/epidemiology , Data Curation , National Library of Medicine (U.S.) , Pneumonia, Viral/epidemiology , Betacoronavirus , COVID-19 , Data Collection , Global Health , Humans , Pandemics , Quality Control , SARS-CoV-2 , United States